Rodney Dyer, PhD
There are many times when we have several columns of data recorded on individual observations.
Some of the consequences of this are that we may have problems:
Are there methods for visualization and quantification of data like this?
A method to factor high-dimensional data into additive subcomponents
Just as you can factor the polynomial -6x^2 + 5x + 4 into (2x + 1)(-3x + 4), a large data set with N rows and K columns can be factored based on its column-wise means, variances, and the covariances between columns.
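The factoring claim is easy to check by multiplying the factors back out. This is a minimal Python sketch; `poly_mul` is a hypothetical helper written for illustration, and coefficients are stored lowest degree first.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b  # x^i * x^j contributes to the x^(i+j) term
    return out

# (2x + 1) is [1, 2] and (-3x + 4) is [4, -3]
product = poly_mul([1, 2], [4, -3])
print(product)  # [4, 5, -6], i.e. 4 + 5x - 6x^2
```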
Consider the matrix of data X with N rows and K columns. The variances of the K columns and the covariances between them can be represented as a K x K covariance matrix, which is computed from the column-centered data X_c (each column of X minus its mean) with this fancy formula:
S = \frac{1}{N-1} X_c^\prime X_c
S = \left[ \begin{array}{cccc} \sigma_A^2 & \sigma_{AB} & \ldots & \sigma_{AK} \\ \sigma_{BA} & \sigma_B^2 & \ldots & \sigma_{BK} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{KA} & \sigma_{KB} & \ldots & \sigma_K^2 \end{array}\right]
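A minimal sketch of building S by hand in plain Python, assuming the data arrive as a list of N rows with K numeric columns and using the usual N - 1 denominator for sample (co)variances. In R, `cov()` does this for you; the function name `covariance_matrix` and the toy data below are illustrative only.

```python
def covariance_matrix(X):
    """Sample covariance matrix (K x K) of an N x K data set given as a list of rows."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]   # column means
    Xc = [[row[j] - means[j] for j in range(k)] for row in X]  # center each column
    # Entry (a, b) is the cross-product of centered columns a and b over N - 1
    return [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(k)]
            for a in range(k)]

# Toy data: N = 4 observations of K = 3 variables
X = [[2.0, 1.0, 0.0],
     [4.0, 3.0, 1.0],
     [6.0, 5.0, 1.0],
     [8.0, 7.0, 2.0]]
S = covariance_matrix(X)
for row in S:
    print(row)  # diagonal: variances; off-diagonal: covariances
```

The matrix is symmetric by construction, which is what allows the additive partition that follows.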
Because S is symmetric, we can partition it into K additive pieces (its eigendecomposition):
S = \sum_{i=1}^K \lambda_{i} \ell^\prime_i \ell_i
Where:
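\lambda_i is a scaling number (the i-th eigenvalue, which equals the variance of the data along the i-th component), and
\ell_i is a 1 x K vector of values (the i-th eigenvector, or loadings), of unit length and orthogonal to every other \ell_j.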
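To see the partition in action, the sketch below eigen-decomposes a small symmetric matrix with the classic cyclic Jacobi rotation method and then rebuilds it as the sum of lambda_i * ell_i' ell_i. This is a plain-Python illustration that assumes a small matrix; in R you would normally just call `eigen()`. The helper names and the example matrix are hypothetical.

```python
import math

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def jacobi_eigen(S, tol=1e-20, max_sweeps=50):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (lambdas, ells): eigenvalues in decreasing order and the matching
    unit-length eigenvectors, each stored as a 1 x K row (a plain list).
    """
    k = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(k)] for i in range(k)]  # accumulates rotations
    for _ in range(max_sweeps):
        off = sum(A[p][q] ** 2 for p in range(k) for q in range(k) if p != q)
        if off < tol:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if A[p][q] == 0.0:
                    continue
                # Angle that zeroes A[p][q]: tan(2 * theta) = 2 A_pq / (A_pp - A_qq)
                theta = 0.5 * math.atan2(2 * A[p][q], A[p][p] - A[q][q])
                c, s = math.cos(theta), math.sin(theta)
                R = [[float(i == j) for j in range(k)] for i in range(k)]
                R[p][p], R[p][q], R[q][p], R[q][q] = c, -s, s, c
                A = matmul(transpose(R), matmul(A, R))  # A <- R' A R
                V = matmul(V, R)
    # Diagonal of A holds the eigenvalues; columns of V hold the eigenvectors
    pairs = sorted(((A[i][i], [V[r][i] for r in range(k)]) for i in range(k)),
                   key=lambda pair: pair[0], reverse=True)
    return [lam for lam, _ in pairs], [ell for _, ell in pairs]

def reconstruct(lambdas, ells):
    """Rebuild S as the sum over i of lambda_i * ell_i' ell_i (K x K outer products)."""
    k = len(ells[0])
    return [[sum(lam * ell[a] * ell[b] for lam, ell in zip(lambdas, ells))
             for b in range(k)] for a in range(k)]

# Hypothetical 3 x 3 covariance matrix
S = [[4.0, 2.0, 0.6],
     [2.0, 3.0, 0.4],
     [0.6, 0.4, 1.0]]
lambdas, ells = jacobi_eigen(S)
S_hat = reconstruct(lambdas, ells)
print([round(lam, 4) for lam in lambdas])  # eigenvalues, largest first
```

Because each \ell_i' \ell_i term is K x K, the sum has the same shape as S, and the eigenvalues add up to the trace of S (the total variance).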
Consider the following data
The transformation you are doing is linear: it rotates the original data from their old coordinate space into a new coordinate space with the same number of dimensions, whose axes are the \ell_i vectors.
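For two variables the rotation has a closed form, which makes the idea concrete. This hypothetical sketch centers toy 2-D data, finds the angle of the first principal axis from the 2 x 2 covariance matrix, and projects each point onto the two rotated axes. The function names and data are illustrative assumptions, not the code used for the slides.

```python
import math

def sample_cov(u, v):
    """Sample covariance of two equal-length lists (N - 1 denominator)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

def rotate_to_principal_axes(points):
    """Center 2-D points and express them in the principal-axis coordinate system."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    xs = [x for x, _ in centered]
    ys = [y for _, y in centered]
    # 2 x 2 covariance matrix [[a, b], [b, d]]
    a, b, d = sample_cov(xs, xs), sample_cov(xs, ys), sample_cov(ys, ys)
    # Angle of the first principal axis: tan(2 * theta) = 2b / (a - d)
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    # Unit axes (c, s) and (-s, c) are the two eigenvectors; project onto them
    return [(c * x + s * y, -s * x + c * y) for x, y in centered]

# Toy data for illustration
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
scores = rotate_to_principal_axes(points)
pc1 = [u for u, _ in scores]
pc2 = [v for _, v in scores]
print(sample_cov(pc1, pc2))  # ~0: the new axes are uncorrelated
```

The new coordinates (scores) are uncorrelated, the first axis carries the most variance, and the total variance is unchanged, because a rotation only re-expresses the same points.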
[1] "sdev" "loadings" "center" "scale" "n.obs" "scores" "call"
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
Standard deviation 2.8113021 2.19949725 1.98692071 1.76188725 1.35153653
Proportion of Variance 0.1362659 0.08341014 0.06806645 0.05352149 0.03149398
Cumulative Proportion 0.1362659 0.21967599 0.28774244 0.34126393 0.37275792
Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
Standard deviation 1.3052912 1.24832072 1.23585320 1.20816941 1.16573700
Proportion of Variance 0.0293756 0.02686732 0.02633333 0.02516678 0.02343005
Cumulative Proportion 0.4021335 0.42900084 0.45533417 0.48050095 0.50393100
Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
Standard deviation 1.1479296 1.12805147 1.11077635 1.09681605 1.07210090
Proportion of Variance 0.0227197 0.02193966 0.02127283 0.02074147 0.01981725
Cumulative Proportion 0.5266507 0.54859035 0.56986318 0.59060466 0.61042190
Comp.16 Comp.17 Comp.18 Comp.19 Comp.20
Standard deviation 1.06907461 1.06258841 1.05051763 1.03671883 1.0197660
Proportion of Variance 0.01970553 0.01946714 0.01902737 0.01853079 0.0179297
Cumulative Proportion 0.63012743 0.64959457 0.66862194 0.68715273 0.7050824
Comp.21 Comp.22 Comp.23 Comp.24 Comp.25
Standard deviation 1.00291893 0.9913325 0.9779643 0.96869623 0.95675236
Proportion of Variance 0.01734218 0.0169438 0.0164899 0.01617883 0.01578233
Cumulative Proportion 0.72242461 0.7393684 0.7558583 0.77203714 0.78781947
Comp.26 Comp.27 Comp.28 Comp.29 Comp.30
Standard deviation 0.94329924 0.93880305 0.91823416 0.89529206 0.87049453
Proportion of Variance 0.01534161 0.01519571 0.01453714 0.01381979 0.01306484
Cumulative Proportion 0.80316108 0.81835679 0.83289393 0.84671372 0.85977856
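The "Proportion of Variance" row is each component's variance (its standard deviation squared) divided by the total variance summed over all components, and "Cumulative Proportion" is the running sum; this is why the cumulative value at Comp.30 is still below 1 when more components exist. A minimal sketch of that arithmetic, using made-up standard deviations rather than the values above (the function name `importance` is hypothetical):

```python
def importance(sdev):
    """Proportion and cumulative proportion of variance from component standard deviations."""
    variances = [s ** 2 for s in sdev]
    total = sum(variances)
    proportion = [v / total for v in variances]
    cumulative = []
    running = 0.0
    for p in proportion:
        running += p
        cumulative.append(running)
    return proportion, cumulative

# Made-up standard deviations for three components
prop, cum = importance([2.0, 1.0, 1.0])
print(prop)
print(cum)
```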
This works just like PCA on raw data, except that the individuals are coalesced into populations, each described by its allele frequencies.
Stratum AML-01 AML-02 AML-03 AML-04 AML-05 AML-06 AML-07 AML-08 AML-09
1 101 0.00 0 0 0.0000000 0.0000000 0.00 0.00 0.50 0.00
2 102 0.00 0 0 0.0000000 0.0000000 0.00 0.00 0.00 0.00
3 12 0.05 0 0 0.0000000 0.0000000 0.05 0.35 0.50 0.00
4 153 0.00 0 0 0.0000000 0.0000000 0.00 0.60 0.35 0.05
5 156 0.00 0 0 0.6666667 0.3333333 0.00 0.00 0.00 0.00
6 157 0.00 0 0 0.7000000 0.1000000 0.20 0.00 0.00 0.00
AML-10 AML-11 AML-12 AML-13 ATPS-01 ATPS-02 ATPS-03 ATPS-04 ATPS-05
1 0.00 0.5 0 0 0 0.6666667 0.0000000 0.1111111 0.00
2 0.00 1.0 0 0 0 0.9375000 0.0000000 0.0000000 0.00
3 0.05 0.0 0 0 0 0.0000000 0.0000000 0.0000000 1.00
4 0.00 0.0 0 0 0 0.0000000 0.0000000 0.0000000 1.00
5 0.00 0.0 0 0 0 0.0000000 0.9166667 0.0000000 0.00
6 0.00 0.0 0 0 0 0.0000000 0.7000000 0.0000000 0.15
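Building such a matrix amounts to counting alleles within each population and locus, so that the frequencies for each locus sum to 1 within a population. A minimal sketch, assuming diploid genotypes with no missing data; the input layout, population names, and genotypes are hypothetical, and alleles absent from a population are simply omitted rather than stored as 0.

```python
from collections import Counter, defaultdict

def allele_frequency_matrix(individuals):
    """Coalesce diploid individuals into per-population allele frequencies.

    individuals: list of (population, {locus: (allele1, allele2)}) tuples.
    Returns {population: {"LOCUS-ALLELE": frequency}}.
    """
    counts = defaultdict(lambda: defaultdict(Counter))  # pop -> locus -> allele counts
    for pop, genotype in individuals:
        for locus, alleles in genotype.items():
            counts[pop][locus].update(alleles)  # adds one count per allele copy
    table = {}
    for pop, loci in counts.items():
        row = {}
        for locus, allele_counts in loci.items():
            total = sum(allele_counts.values())
            for allele, n in allele_counts.items():
                row[f"{locus}-{allele}"] = n / total
        table[pop] = row
    return table

# Hypothetical genotypes for two small populations (not the data shown above)
individuals = [
    ("PopA", {"AML": ("08", "11"), "ATPS": ("02", "02")}),
    ("PopA", {"AML": ("08", "11"), "ATPS": ("02", "04")}),
    ("PopB", {"AML": ("11", "11"), "ATPS": ("02", "02")}),
]
freqs = allele_frequency_matrix(individuals)
print(freqs["PopA"])
```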
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Standard deviation 0.9961 0.7719 0.5274 0.39514 0.2898 0.25543 0.24065
Proportion of Variance 0.4135 0.2483 0.1159 0.06508 0.0350 0.02719 0.02414
Cumulative Proportion 0.4135 0.6618 0.7777 0.84282 0.8778 0.90501 0.92914
PC8 PC9 PC10 PC11 PC12 PC13 PC14
Standard deviation 0.19880 0.15834 0.14287 0.13783 0.12481 0.10191 0.09150
Proportion of Variance 0.01647 0.01045 0.00851 0.00792 0.00649 0.00433 0.00349
Cumulative Proportion 0.94562 0.95606 0.96457 0.97249 0.97898 0.98331 0.98680
PC15 PC16 PC17 PC18 PC19 PC20 PC21
Standard deviation 0.08413 0.07641 0.07166 0.05890 0.05077 0.03845 0.03744
Proportion of Variance 0.00295 0.00243 0.00214 0.00145 0.00107 0.00062 0.00058
Cumulative Proportion 0.98975 0.99218 0.99432 0.99577 0.99685 0.99746 0.99805
PC22 PC23 PC24 PC25 PC26 PC27 PC28
Standard deviation 0.03216 0.02974 0.02461 0.02256 0.01880 0.01789 0.01682
Proportion of Variance 0.00043 0.00037 0.00025 0.00021 0.00015 0.00013 0.00012
Cumulative Proportion 0.99848 0.99884 0.99910 0.99931 0.99946 0.99959 0.99971
PC29 PC30 PC31 PC32 PC33 PC34
Standard deviation 0.01469 0.01358 0.01061 0.007838 0.006974 0.005382
Proportion of Variance 0.00009 0.00008 0.00005 0.000030 0.000020 0.000010
Cumulative Proportion 0.99980 0.99987 0.99992 0.999950 0.999970 0.999980
PC35 PC36 PC37 PC38 PC39
Standard deviation 0.004322 0.003937 0.003217 0.002008 4.423e-16
Proportion of Variance 0.000010 0.000010 0.000000 0.000000 0.000e+00
Cumulative Proportion 0.999990 0.999990 1.000000 1.000000 1.000e+00
Like PCA but using distance matrices instead of raw data.
[1] 39 39
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Standard deviation 3.4963 2.3244 1.38995 0.76870 0.62286 0.5129 0.4473
Proportion of Variance 0.5622 0.2485 0.08884 0.02717 0.01784 0.0121 0.0092
Cumulative Proportion 0.5622 0.8106 0.89946 0.92664 0.94448 0.9566 0.9658
PC8 PC9 PC10 PC11 PC12 PC13 PC14
Standard deviation 0.39332 0.31379 0.26270 0.23524 0.20290 0.1976 0.18482
Proportion of Variance 0.00711 0.00453 0.00317 0.00254 0.00189 0.0018 0.00157
Cumulative Proportion 0.97289 0.97742 0.98059 0.98314 0.98503 0.9868 0.98839
PC15 PC16 PC17 PC18 PC19 PC20 PC21
Standard deviation 0.18292 0.16247 0.14794 0.14137 0.13605 0.12182 0.11651
Proportion of Variance 0.00154 0.00121 0.00101 0.00092 0.00085 0.00068 0.00062
Cumulative Proportion 0.98993 0.99115 0.99215 0.99307 0.99392 0.99461 0.99523
PC22 PC23 PC24 PC25 PC26 PC27 PC28
Standard deviation 0.11066 0.1039 0.10234 0.09489 0.08724 0.08436 0.07748
Proportion of Variance 0.00056 0.0005 0.00048 0.00041 0.00035 0.00033 0.00028
Cumulative Proportion 0.99579 0.9963 0.99677 0.99719 0.99754 0.99786 0.99814
PC29 PC30 PC31 PC32 PC33 PC34 PC35
Standard deviation 0.07707 0.07387 0.06873 0.06740 0.06523 0.06105 0.05838
Proportion of Variance 0.00027 0.00025 0.00022 0.00021 0.00020 0.00017 0.00016
Cumulative Proportion 0.99841 0.99866 0.99888 0.99909 0.99929 0.99946 0.99961
PC36 PC37 PC38 PC39
Standard deviation 0.05684 0.05523 0.04602 3.946e-16
Proportion of Variance 0.00015 0.00014 0.00010 0.000e+00
Cumulative Proportion 0.99976 0.99990 1.00000 1.000e+00
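One standard way to ordinate a distance matrix is classical principal coordinate analysis (PCoA, also called classical multidimensional scaling; `cmdscale()` in R). Its key step double-centers the squared distances, and the eigenvectors of the result, scaled by the square roots of their eigenvalues, give the coordinates. The sketch below shows only that step and checks that, for Euclidean distances, it reproduces the cross-product matrix of the centered data, which shares its nonzero eigenvalues with the matrix PCA decomposes. It illustrates the technique and is not necessarily how the output above was produced; the toy points are made up.

```python
import math

def double_center(D):
    """Classical PCoA step: B = -1/2 * J D2 J, with D2 the squared distances
    and J = I - (1/n) 11' the centering matrix."""
    n = len(D)
    D2 = [[d * d for d in row] for row in D]
    row_means = [sum(row) / n for row in D2]
    col_means = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    # Subtracting row and column means and adding back the grand mean is J D2 J
    return [[-0.5 * (D2[i][j] - row_means[i] - col_means[j] + grand_mean)
             for j in range(n)] for i in range(n)]

# Toy 2-D coordinates, used only to build a Euclidean distance matrix
coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0), (1.0, 1.0)]
D = [[math.dist(p, q) for q in coords] for p in coords]
B = double_center(D)

# For Euclidean distances, B equals the cross-product matrix of the centered
# coordinates, so its eigenvectors recover them up to rotation or reflection.
n = len(coords)
means = [sum(p[k] for p in coords) / n for k in range(2)]
centered = [[p[k] - means[k] for k in range(2)] for p in coords]
gram = [[sum(a[k] * b[k] for k in range(2)) for b in centered] for a in centered]
```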
A technique to build a representation of similarity between objects.
Supervised
Unsupervised
Individual or Group Based
Help File for hclust
hclust() requires that the distance matrix first be turned into a dist object (for example with as.dist()), which stores only the lower triangle of a symmetric matrix whose diagonal is zero.
101 102 12 153 156 157
102 2.048994
12 3.972442 4.342952
153 4.099369 4.364062 1.860651
156 4.727214 4.754565 4.901142 4.871141
157 4.541334 4.629884 4.510097 4.532973 1.073274
159 3.733735 4.047019 2.537027 3.302282 4.434527 4.121070
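R's dist objects store only the entries below the diagonal, column by column, which is why the printout above is a triangle. A minimal sketch of that packing and of reading a distance back out (0-based indices; the helper names and the toy matrix are hypothetical):

```python
def pack_lower_triangle(M):
    """Keep only the below-diagonal entries of a symmetric matrix, column by column."""
    n = len(M)
    return [M[i][j] for j in range(n) for i in range(j + 1, n)]

def lookup(packed, n, i, j):
    """Distance between objects i and j (0-based) read back from the packed vector."""
    if i == j:
        return 0.0  # the diagonal is implicitly zero
    if i < j:
        i, j = j, i  # symmetric: always read from below the diagonal
    # Column c holds n - 1 - c entries, so j * (n - 1) - j * (j - 1) // 2 entries precede column j
    return packed[j * (n - 1) - j * (j - 1) // 2 + (i - j - 1)]

# Hypothetical 4 x 4 symmetric distance matrix
M = [[0.0, 2.0, 5.0, 4.0],
     [2.0, 0.0, 3.0, 6.0],
     [5.0, 3.0, 0.0, 1.0],
     [4.0, 6.0, 1.0, 0.0]]
packed = pack_lower_triangle(M)
print(packed)  # [2.0, 5.0, 4.0, 3.0, 6.0, 1.0]
print(lookup(packed, 4, 1, 3))  # 6.0
```

Storing n(n - 1)/2 values instead of n^2 is the "constraint" that makes a dist object different from a plain matrix.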
Call:
hclust(d = d)
Cluster method : complete
Distance : euclidean
Number of objects: 39
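Complete linkage repeatedly merges the two clusters whose farthest members are closest. A naive sketch of that loop, run on the 7 populations whose distances appear in the truncated printout above; the full analysis used 39 objects, so this tree is only a fragment and is not the one hclust() returned. The function name and output format are hypothetical.

```python
def complete_linkage(D, labels):
    """Naive agglomerative clustering with complete linkage.

    D is a full symmetric distance matrix. Returns merges as
    (members_a, members_b, height) in the order they happen.
    """
    index = {lab: i for i, lab in enumerate(labels)}
    clusters = [[lab] for lab in labels]

    def cluster_dist(a, b):
        # Complete linkage: the largest pairwise distance between the two clusters
        return max(D[index[x]][index[y]] for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                h = cluster_dist(clusters[i], clusters[j])
                if best is None or h < best[0]:
                    best = (h, i, j)
        h, i, j = best
        merges.append((clusters[i], clusters[j], h))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# The 7 objects visible in the dist printout above (lower triangle, by row)
labels = ["101", "102", "12", "153", "156", "157", "159"]
lower = [
    [],
    [2.048994],
    [3.972442, 4.342952],
    [4.099369, 4.364062, 1.860651],
    [4.727214, 4.754565, 4.901142, 4.871141],
    [4.541334, 4.629884, 4.510097, 4.532973, 1.073274],
    [3.733735, 4.047019, 2.537027, 3.302282, 4.434527, 4.121070],
]
n = len(labels)
D = [[0.0] * n for _ in range(n)]
for i, row in enumerate(lower):
    for j, d in enumerate(row):
        D[i][j] = D[j][i] = d

merges = complete_linkage(D, labels)
for a, b, h in merges:
    print(a, b, h)
```

Recomputing every cluster distance on each pass is O(n^3) or worse, which is fine for a handful of objects; production implementations such as hclust() update distances incrementally instead.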